Linear Regression

# Load the dataset "parkinsons_updrs"
Parkinsons = read.csv("~/Library/Mobile Documents/com~apple~CloudDocs/Desktop/Univ Miami/6th Semester - SPRING 2022/CSC 597/Assignments/Datasets/Regression/parkinsons_updrs.data")

1. Use the appropriate functions to obtain descriptive information about the variables included in the dataset (paste or include a screenshot with the resulting information)

# Displays names of the variables
names(Parkinsons) 
 [1] "subject."      "age"           "sex"           "test_time"     "motor_UPDRS"   "total_UPDRS"  
 [7] "Jitter..."     "Jitter.Abs."   "Jitter.RAP"    "Jitter.PPQ5"   "Jitter.DDP"    "Shimmer"      
[13] "Shimmer.dB."   "Shimmer.APQ3"  "Shimmer.APQ5"  "Shimmer.APQ11" "Shimmer.DDA"   "NHR"          
[19] "HNR"           "RPDE"          "DFA"           "PPE"          
# Displays dimension of the dataframe
dim(Parkinsons) 
[1] 5875   22
# Descriptive information about the variables included in the dataset
summary(Parkinsons)
    subject.          age            sex           test_time        motor_UPDRS      total_UPDRS   
 Min.   : 1.00   Min.   :36.0   Min.   :0.0000   Min.   : -4.263   Min.   : 5.038   Min.   : 7.00  
 1st Qu.:10.00   1st Qu.:58.0   1st Qu.:0.0000   1st Qu.: 46.847   1st Qu.:15.000   1st Qu.:21.37  
 Median :22.00   Median :65.0   Median :0.0000   Median : 91.523   Median :20.871   Median :27.58  
 Mean   :21.49   Mean   :64.8   Mean   :0.3178   Mean   : 92.864   Mean   :21.296   Mean   :29.02  
 3rd Qu.:33.00   3rd Qu.:72.0   3rd Qu.:1.0000   3rd Qu.:138.445   3rd Qu.:27.596   3rd Qu.:36.40  
 Max.   :42.00   Max.   :85.0   Max.   :1.0000   Max.   :215.490   Max.   :39.511   Max.   :54.99  
   Jitter...         Jitter.Abs.          Jitter.RAP        Jitter.PPQ5         Jitter.DDP      
 Min.   :0.000830   Min.   :2.250e-06   Min.   :0.000330   Min.   :0.000430   Min.   :0.000980  
 1st Qu.:0.003580   1st Qu.:2.244e-05   1st Qu.:0.001580   1st Qu.:0.001820   1st Qu.:0.004730  
 Median :0.004900   Median :3.453e-05   Median :0.002250   Median :0.002490   Median :0.006750  
 Mean   :0.006154   Mean   :4.403e-05   Mean   :0.002987   Mean   :0.003277   Mean   :0.008962  
 3rd Qu.:0.006800   3rd Qu.:5.333e-05   3rd Qu.:0.003290   3rd Qu.:0.003460   3rd Qu.:0.009870  
 Max.   :0.099990   Max.   :4.456e-04   Max.   :0.057540   Max.   :0.069560   Max.   :0.172630  
    Shimmer         Shimmer.dB.     Shimmer.APQ3      Shimmer.APQ5     Shimmer.APQ11    
 Min.   :0.00306   Min.   :0.026   Min.   :0.00161   Min.   :0.00194   Min.   :0.00249  
 1st Qu.:0.01912   1st Qu.:0.175   1st Qu.:0.00928   1st Qu.:0.01079   1st Qu.:0.01566  
 Median :0.02751   Median :0.253   Median :0.01370   Median :0.01594   Median :0.02271  
 Mean   :0.03404   Mean   :0.311   Mean   :0.01716   Mean   :0.02014   Mean   :0.02748  
 3rd Qu.:0.03975   3rd Qu.:0.365   3rd Qu.:0.02057   3rd Qu.:0.02375   3rd Qu.:0.03272  
 Max.   :0.26863   Max.   :2.107   Max.   :0.16267   Max.   :0.16702   Max.   :0.27546  
  Shimmer.DDA           NHR                HNR              RPDE             DFA        
 Min.   :0.00484   Min.   :0.000286   Min.   : 1.659   Min.   :0.1510   Min.   :0.5140  
 1st Qu.:0.02783   1st Qu.:0.010955   1st Qu.:19.406   1st Qu.:0.4698   1st Qu.:0.5962  
 Median :0.04111   Median :0.018448   Median :21.920   Median :0.5423   Median :0.6436  
 Mean   :0.05147   Mean   :0.032120   Mean   :21.680   Mean   :0.5415   Mean   :0.6532  
 3rd Qu.:0.06173   3rd Qu.:0.031463   3rd Qu.:24.444   3rd Qu.:0.6140   3rd Qu.:0.7113  
 Max.   :0.48802   Max.   :0.748260   Max.   :37.875   Max.   :0.9661   Max.   :0.8656  
      PPE         
 Min.   :0.02198  
 1st Qu.:0.15634  
 Median :0.20550  
 Mean   :0.21959  
 3rd Qu.:0.26449  
 Max.   :0.73173  

2. Calculate the correlation between the different attributes (include the figure produced by R in your answer)

# Correlation
correlation = cor(Parkinsons)
correlation
                   subject.          age           sex     test_time motor_UPDRS total_UPDRS   Jitter...
subject.       1.0000000000 -0.030863612  0.2868514199 -0.0008815743  0.25291853  0.25364275  0.13544752
age           -0.0308636122  1.000000000 -0.0416017291  0.0198838435  0.27366476  0.31028993  0.02307118
sex            0.2868514199 -0.041601729  1.0000000000 -0.0098049838 -0.03120501 -0.09655888  0.05142162
test_time     -0.0008815743  0.019883844 -0.0098049838  1.0000000000  0.06791826  0.07526266 -0.02283709
motor_UPDRS    0.2529185298  0.273664760 -0.0312050144  0.0679182641  1.00000000  0.94723131  0.08481576
total_UPDRS    0.2536427490  0.310289929 -0.0965588806  0.0752626604  0.94723131  1.00000000  0.07424667
Jitter...      0.1354475184  0.023071181  0.0514216175 -0.0228370926  0.08481576  0.07424667  1.00000000
Jitter.Abs.    0.0751561345  0.035691340 -0.1546453007 -0.0113648117  0.05090328  0.06692673  0.86557722
Jitter.RAP     0.1203393232  0.010254988  0.0767182203 -0.0288878317  0.07268353  0.06401542  0.98418075
Jitter.PPQ5    0.1364738360  0.013199367  0.0879947680 -0.0232899083  0.07629087  0.06335178  0.96821443
Jitter.DDP     0.1203500584  0.010257836  0.0767031684 -0.0288759827  0.07269792  0.06402746  0.98418354
Shimmer        0.1462017730  0.101553856  0.0587357861 -0.0338701798  0.10234870  0.09214091  0.70979112
Shimmer.dB.    0.1428639729  0.111129664  0.0564805319 -0.0309624121  0.11007600  0.09878973  0.71670399
Shimmer.APQ3   0.1129497993  0.098912301  0.0449371995 -0.0290196929  0.08426056  0.07936272  0.66414874
Shimmer.APQ5   0.1382636007  0.089982893  0.0648192972 -0.0365044263  0.09210517  0.08346725  0.69400164
Shimmer.APQ11  0.1733326282  0.135237944  0.0233598626 -0.0391096958  0.13656029  0.12083750  0.64596519
Shimmer.DDA    0.1129486657  0.098913123  0.0449375945 -0.0290168593  0.08426039  0.07936324  0.66414746
NHR            0.1687433623  0.007092699  0.1681695195 -0.0263570332  0.07496727  0.06095164  0.82529366
HNR           -0.2069286890 -0.104842069 -0.0001671123  0.0365448637 -0.15702858 -0.16211683 -0.67518824
RPDE           0.1473003405  0.090208319 -0.1592624409 -0.0388869742  0.12860740  0.15689651  0.42712754
DFA            0.0974642595 -0.092870159 -0.1651134712  0.0192608786 -0.11624248 -0.11347483  0.22654994
PPE            0.1575592025  0.120789753 -0.0999006846 -0.0005633701  0.16243297  0.15619488  0.72184881
              Jitter.Abs.  Jitter.RAP Jitter.PPQ5  Jitter.DDP     Shimmer Shimmer.dB. Shimmer.APQ3
subject.       0.07515613  0.12033932  0.13647384  0.12035006  0.14620177  0.14286397   0.11294980
age            0.03569134  0.01025499  0.01319937  0.01025784  0.10155386  0.11112966   0.09891230
sex           -0.15464530  0.07671822  0.08799477  0.07670317  0.05873579  0.05648053   0.04493720
test_time     -0.01136481 -0.02888783 -0.02328991 -0.02887598 -0.03387018 -0.03096241  -0.02901969
motor_UPDRS    0.05090328  0.07268353  0.07629087  0.07269792  0.10234870  0.11007600   0.08426056
total_UPDRS    0.06692673  0.06401542  0.06335178  0.06402746  0.09214091  0.09878973   0.07936272
Jitter...      0.86557722  0.98418075  0.96821443  0.98418354  0.70979112  0.71670399   0.66414874
Jitter.Abs.    1.00000000  0.84462628  0.79053765  0.84463035  0.64904638  0.65587068   0.62382984
Jitter.RAP     0.84462628  1.00000000  0.94719593  0.99999962  0.68172901  0.68555054   0.65022614
Jitter.PPQ5    0.79053765  0.94719593  1.00000000  0.94720256  0.73274748  0.73459079   0.67671149
Jitter.DDP     0.84463035  0.99999962  0.94720256  1.00000000  0.68173376  0.68555613   0.65022816
Shimmer        0.64904638  0.68172901  0.73274748  0.68173376  1.00000000  0.99233407   0.97982804
Shimmer.dB.    0.65587068  0.68555054  0.73459079  0.68555613  0.99233407  1.00000000   0.96801480
Shimmer.APQ3   0.62382984  0.65022614  0.67671149  0.65022816  0.97982804  0.96801480   1.00000000
Shimmer.APQ5   0.62140081  0.65983121  0.73402075  0.65983319  0.98490432  0.97637257   0.96272296
Shimmer.APQ11  0.58999842  0.60308168  0.66841348  0.60309033  0.93545684  0.93633812   0.88569537
Shimmer.DDA    0.62382750  0.65022465  0.67671017  0.65022667  0.97982731  0.96801427   0.99999998
NHR            0.69995990  0.79237273  0.86486425  0.79237731  0.79515848  0.79807697   0.73273634
HNR           -0.70641805 -0.64147280 -0.66240886 -0.64148177 -0.80141600 -0.80249646  -0.78069689
RPDE           0.54709960  0.38289088  0.38150298  0.38288580  0.46823455  0.47240859   0.43687810
DFA            0.35226386  0.21488132  0.17535854  0.21489299  0.13253994  0.12611117   0.13073500
PPE            0.78785284  0.67065210  0.66349144  0.67066035  0.61570856  0.63516268   0.57670395
              Shimmer.APQ5 Shimmer.APQ11 Shimmer.DDA          NHR           HNR        RPDE         DFA
subject.        0.13826360    0.17333263  0.11294867  0.168743362 -0.2069286890  0.14730034  0.09746426
age             0.08998289    0.13523794  0.09891312  0.007092699 -0.1048420689  0.09020832 -0.09287016
sex             0.06481930    0.02335986  0.04493759  0.168169520 -0.0001671123 -0.15926244 -0.16511347
test_time      -0.03650443   -0.03910970 -0.02901686 -0.026357033  0.0365448637 -0.03888697  0.01926088
motor_UPDRS     0.09210517    0.13656029  0.08426039  0.074967270 -0.1570285788  0.12860740 -0.11624248
total_UPDRS     0.08346725    0.12083750  0.07936324  0.060951644 -0.1621168287  0.15689651 -0.11347483
Jitter...       0.69400164    0.64596519  0.66414746  0.825293655 -0.6751882442  0.42712754  0.22654994
Jitter.Abs.     0.62140081    0.58999842  0.62382750  0.699959896 -0.7064180505  0.54709960  0.35226386
Jitter.RAP      0.65983121    0.60308168  0.65022465  0.792372728 -0.6414728036  0.38289088  0.21488132
Jitter.PPQ5     0.73402075    0.66841348  0.67671017  0.864864252 -0.6624088579  0.38150298  0.17535854
Jitter.DDP      0.65983319    0.60309033  0.65022667  0.792377310 -0.6414817715  0.38288580  0.21489299
Shimmer         0.98490432    0.93545684  0.97982731  0.795158485 -0.8014160019  0.46823455  0.13253994
Shimmer.dB.     0.97637257    0.93633812  0.96801427  0.798076972 -0.8024964615  0.47240859  0.12611117
Shimmer.APQ3    0.96272296    0.88569537  0.99999998  0.732736344 -0.7806968895  0.43687810  0.13073500
Shimmer.APQ5    1.00000000    0.93893494  0.96272308  0.798173148 -0.7906382164  0.45088990  0.12803754
Shimmer.APQ11   0.93893494    1.00000000  0.88569414  0.711546170 -0.7779743467  0.48073856  0.17964765
Shimmer.DDA     0.96272308    0.88569414  1.00000000  0.732733983 -0.7806962950  0.43687244  0.13073592
NHR             0.79817315    0.71154617  0.73273398  1.000000000 -0.6844118571  0.41665964 -0.02208778
HNR            -0.79063822   -0.77797435 -0.78069630 -0.684411857  1.0000000000 -0.65905315 -0.29051945
RPDE            0.45088990    0.48073856  0.43687244  0.416659644 -0.6590531523  1.00000000  0.19203007
DFA             0.12803754    0.17964765  0.13073592 -0.022087779 -0.2905194517  0.19203007  1.00000000
PPE             0.59367655    0.62341606  0.57670220  0.564654472 -0.7587222059  0.56606485  0.39464966
                        PPE
subject.       0.1575592025
age            0.1207897526
sex           -0.0999006846
test_time     -0.0005633701
motor_UPDRS    0.1624329732
total_UPDRS    0.1561948752
Jitter...      0.7218488137
Jitter.Abs.    0.7878528397
Jitter.RAP     0.6706520982
Jitter.PPQ5    0.6634914441
Jitter.DDP     0.6706603464
Shimmer        0.6157085590
Shimmer.dB.    0.6351626782
Shimmer.APQ3   0.5767039508
Shimmer.APQ5   0.5936765462
Shimmer.APQ11  0.6234160550
Shimmer.DDA    0.5767021962
NHR            0.5646544721
HNR           -0.7587222059
RPDE           0.5660648549
DFA            0.3946496554
PPE            1.0000000000
# Create figure: scatterplot matrix of the columns of the correlation matrix, saved to a PDF
pdf("Correlation-Figure.pdf")
pairs(correlation)
dev.off()
null device 
          1 
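
Note that pairs(correlation) plots the columns of the correlation matrix against one another, not the coefficients themselves. One alternative view is a heatmap of the matrix. The sketch below is only an illustration: it uses a small synthetic data frame (the hypothetical `demo`, not the Parkinsons file) and writes to a temporary file.

```r
# Synthetic stand-in for the Parkinsons data frame (hypothetical values).
set.seed(1)
x1 <- rnorm(200)
demo <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(200, sd = 0.1),  # strongly correlated with x1
  x3 = rnorm(200)                  # roughly independent of x1 and x2
)

corr_demo <- cor(demo)

# Draw the correlation matrix as a heatmap.
# Rowv = NA and Colv = NA switch off clustering and dendrograms;
# scale = "none" keeps the raw correlation values.
out_file <- file.path(tempdir(), "Correlation-Heatmap.pdf")
pdf(out_file)
heatmap(corr_demo, Rowv = NA, Colv = NA, symm = TRUE, scale = "none",
        margins = c(8, 8))
dev.off()
```

With the real data, cor(Parkinsons) would take the place of corr_demo.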

3. Divide the input dataset into training and testing

a. Split the datasets using 80% for training and 20% for testing

# Randomly sample 80% of the row indices; no seed is set, so the split changes on every run
dt = sort(sample(nrow(Parkinsons), nrow(Parkinsons)*.8))
train = Parkinsons[dt,] # 4700 of 5875 observations (80%)
test = Parkinsons[-dt,] # 1175 of 5875 observations (20%)

# Training
train
# Testing
test

b. How many examples will be used for training and how many for testing?

Training uses 4700 of the 5875 examples (80%); testing uses the remaining 1175 (20%).
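
Because sample() is random, every run of the split produces different rows and slightly different model results. A minimal sketch of a reproducible split follows, assuming a fixed seed is acceptable; the seed value and the names `train_idx` and `test_idx` are illustrative and not part of the original analysis.

```r
set.seed(597)  # arbitrary seed, chosen only for illustration

n <- 5875                                     # number of rows reported by dim()
train_idx <- sample(n, size = round(0.8 * n)) # 80% of the row indices
test_idx  <- setdiff(seq_len(n), train_idx)   # the remaining 20%

length(train_idx)  # 4700
length(test_idx)   # 1175
```

The same indices could then be used as Parkinsons[train_idx, ] and Parkinsons[test_idx, ].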

4. Build a multiple linear regression model containing all the input variables to predict total_UPDRS

# Multiple linear regression model
model = lm(total_UPDRS ~.-motor_UPDRS-total_UPDRS, data = train)
summary(model)

Call:
lm(formula = total_UPDRS ~ . - motor_UPDRS - total_UPDRS, data = train)

Residuals:
    Min      1Q  Median      3Q     Max 
-27.452  -6.683  -1.280   7.076  23.700 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    3.386e+01  3.429e+00   9.874  < 2e-16 ***
subject.       2.681e-01  1.207e-02  22.211  < 2e-16 ***
age            3.210e-01  1.604e-02  20.020  < 2e-16 ***
sex           -4.606e+00  3.492e-01 -13.189  < 2e-16 ***
test_time      1.611e-02  2.545e-03   6.328 2.72e-10 ***
Jitter...     -3.691e+02  2.283e+02  -1.616 0.106063    
Jitter.Abs.   -3.957e+04  1.068e+04  -3.705 0.000214 ***
Jitter.RAP    -1.687e+04  4.981e+04  -0.339 0.734830    
Jitter.PPQ5   -1.442e+02  1.998e+02  -0.722 0.470580    
Jitter.DDP     6.039e+03  1.660e+04   0.364 0.716098    
Shimmer        2.560e+01  6.811e+01   0.376 0.707056    
Shimmer.dB.    5.876e-01  5.201e+00   0.113 0.910064    
Shimmer.APQ3  -5.512e+04  5.012e+04  -1.100 0.271474    
Shimmer.APQ5   3.647e+01  6.013e+01   0.607 0.544179    
Shimmer.APQ11  8.811e+00  2.606e+01   0.338 0.735321    
Shimmer.DDA    1.831e+04  1.671e+04   1.096 0.273047    
NHR           -2.209e+01  6.709e+00  -3.293 0.000999 ***
HNR           -4.831e-01  7.375e-02  -6.550 6.36e-11 ***
RPDE           1.612e+00  1.957e+00   0.824 0.410054    
DFA           -3.524e+01  2.469e+00 -14.274  < 2e-16 ***
PPE            1.667e+01  3.111e+00   5.357 8.89e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.269 on 4679 degrees of freedom
Multiple R-squared:  0.2524,    Adjusted R-squared:  0.2492 
F-statistic: 78.99 on 20 and 4679 DF,  p-value: < 2.2e-16

a. Which predictors have a significant impact in the prediction?

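subject., age, sex, test_time, Jitter.Abs., NHR, HNR, DFA and PPE appear to have a significant impact on the prediction: each has a p-value below 0.001. A predictor with a low p-value is likely to be a meaningful addition to the model, because changes in its value are associated with changes in the response variable when the other predictors are held fixed. A large p-value means the data give no evidence of such an association; it does not prove that the predictor is irrelevant. Several of the non-significant Jitter and Shimmer measures are almost perfectly correlated with one another (for example, Jitter.RAP and Jitter.DDP at 0.99999962), which inflates their standard errors.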
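
The list of significant predictors can also be read from the coefficient table programmatically instead of by eye. The sketch below assumes a 0.05 threshold and uses a synthetic data frame (hypothetical `demo`, `fit` and `signif_terms`), not the Parkinsons model.

```r
set.seed(2)
n <- 300
demo <- data.frame(a = rnorm(n), b = rnorm(n), noise = rnorm(n))
demo$y <- 3 * demo$a - 2 * demo$b + rnorm(n)  # 'noise' has no true effect on y

fit <- lm(y ~ ., data = demo)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

# Keep the terms whose p-value is below the chosen threshold (0.05 is an assumption)
signif_terms <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]
signif_terms
```

For the model above, replacing fit with model should return the rows marked with stars in the summary.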

b. How does the model perform? Provide the R2 and RSE

R-squared is low (0.2524): the model explains only about 25% of the variation in total_UPDRS around its mean. The RSE (9.269) measures the typical size of the prediction error in UPDRS units. Dividing the RSE by the mean of the outcome variable (total_UPDRS) gives the error rate of the model, which is about 32%. That is fairly high; for the model to perform well, the error rate should be much lower. Both figures are computed on the training data.

# R^2 (proportion of the variance in the response explained by the model)
summary(model)$r.squared
[1] 0.2524219
# RSE (estimated standard deviation of the residuals around the fitted regression line)
summary(model)$sigma
[1] 9.269164
# Error rate (estimated by dividing the RSE by the mean of the outcome variable)
summary(model)$sigma/mean(train$total_UPDRS)
[1] 0.3203573
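
To make the two measures concrete, they can be recomputed from the residuals: R^2 = 1 - RSS/TSS and RSE = sqrt(RSS / residual degrees of freedom). The sketch below checks this on synthetic data (hypothetical `demo` and `fit`), not on the Parkinsons model.

```r
set.seed(3)
n <- 200
demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
demo$y <- 5 + 2 * demo$x1 - demo$x2 + rnorm(n, sd = 2)

fit <- lm(y ~ x1 + x2, data = demo)

rss <- sum(residuals(fit)^2)                # residual sum of squares
tss <- sum((demo$y - mean(demo$y))^2)       # total sum of squares around the mean

r2_manual  <- 1 - rss / tss
rse_manual <- sqrt(rss / df.residual(fit))  # df = n minus the number of coefficients

# Both comparisons should report TRUE
all.equal(r2_manual,  summary(fit)$r.squared)
all.equal(rse_manual, summary(fit)$sigma)
```

The same arithmetic applies to the model above: an RSS of 402008 on 4679 degrees of freedom gives sqrt(402008/4679), roughly 9.269.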

Graduate-student part that I wanted to try out

5. Build a multiple linear regression model to predict total_UPDRS including an interaction term that you consider may be relevant based on the results obtained in (c)

# Formula (subject and age as interaction term)
interaction.model = lm(total_UPDRS ~.-motor_UPDRS-total_UPDRS+subject.*age, data = train)
summary(interaction.model)

Call:
lm(formula = total_UPDRS ~ . - motor_UPDRS - total_UPDRS + subject. * 
    age, data = train)

Residuals:
    Min      1Q  Median      3Q     Max 
-27.698  -6.669  -1.205   6.832  22.725 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    8.103e+00  4.142e+00   1.956  0.05050 .  
subject.       1.276e+00  9.400e-02  13.571  < 2e-16 ***
age            7.093e-01  3.927e-02  18.063  < 2e-16 ***
sex           -4.890e+00  3.460e-01 -14.135  < 2e-16 ***
test_time      1.691e-02  2.516e-03   6.724 1.98e-11 ***
Jitter...     -3.163e+02  2.256e+02  -1.402  0.16101    
Jitter.Abs.   -4.335e+04  1.056e+04  -4.106 4.10e-05 ***
Jitter.RAP    -1.352e+04  4.920e+04  -0.275  0.78344    
Jitter.PPQ5   -2.541e+02  1.976e+02  -1.286  0.19860    
Jitter.DDP     4.944e+03  1.640e+04   0.301  0.76311    
Shimmer        2.842e+01  6.728e+01   0.422  0.67276    
Shimmer.dB.    1.829e-01  5.138e+00   0.036  0.97161    
Shimmer.APQ3  -7.370e+04  4.954e+04  -1.488  0.13690    
Shimmer.APQ5   4.588e+01  5.941e+01   0.772  0.43993    
Shimmer.APQ11  7.299e+00  2.575e+01   0.284  0.77680    
Shimmer.DDA    2.450e+04  1.651e+04   1.484  0.13791    
NHR           -2.136e+01  6.628e+00  -3.222  0.00128 ** 
HNR           -4.837e-01  7.286e-02  -6.640 3.51e-11 ***
RPDE          -2.102e+00  1.964e+00  -1.070  0.28453    
DFA           -3.257e+01  2.452e+00 -13.284  < 2e-16 ***
PPE            1.821e+01  3.077e+00   5.917 3.51e-09 ***
subject.:age  -1.516e-02  1.403e-03 -10.806  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.157 on 4678 degrees of freedom
Multiple R-squared:  0.2706,    Adjusted R-squared:  0.2674 
F-statistic: 82.65 on 21 and 4678 DF,  p-value: < 2.2e-16
# Results (slightly better than the base model: R^2 rises from 0.2524 to 0.2706)
summary(interaction.model)$r.squared
[1] 0.2706283
summary(interaction.model)$sigma
[1] 9.156578
summary(interaction.model)$sigma/mean(train$total_UPDRS)
[1] 0.3164661
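
In an R formula, a*b is shorthand for a + b + a:b, which is why the summary above shows a single extra row, subject.:age, even though the formula adds subject.*age. A small sketch with placeholder variable names (`y`, `a`, `b`, `c` are hypothetical) shows the expansion:

```r
# a * b expands to both main effects plus the interaction
f <- y ~ a * b
labels_star <- attr(terms(f), "term.labels")
labels_star   # "a" "b" "a:b"

# If a and b are already in the formula, adding a * b contributes only a:b
g <- y ~ a + b + c + a * b
labels_g <- attr(terms(g), "term.labels")
labels_g      # "a" "b" "c" "a:b"
```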

6. Build a regression model which includes non-linear transformations of predictors

# Formula (adds squared terms for age and subject., plus the product subject.*age^2)
transformation.model = lm(total_UPDRS ~.-motor_UPDRS-total_UPDRS+subject.*age+I(age^2)+I(subject.^2)+I(subject.*age^2), data = train)
# Results (better again: R^2 = 0.379, RSE = 8.452)
summary(transformation.model)

Call:
lm(formula = total_UPDRS ~ . - motor_UPDRS - total_UPDRS + subject. * 
    age + I(age^2) + I(subject.^2) + I(subject. * age^2), data = train)

Residuals:
    Min      1Q  Median      3Q     Max 
-22.042  -5.858  -1.470   4.662  22.941 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          5.457e+02  2.861e+01  19.072  < 2e-16 ***
subject.            -1.721e+01  8.885e-01 -19.367  < 2e-16 ***
age                 -1.585e+01  8.772e-01 -18.070  < 2e-16 ***
sex                 -2.426e+00  3.335e-01  -7.272 4.12e-13 ***
test_time            1.958e-02  2.325e-03   8.418  < 2e-16 ***
Jitter...           -6.576e+02  2.087e+02  -3.151  0.00164 ** 
Jitter.Abs.          1.390e+03  9.880e+03   0.141  0.88812    
Jitter.RAP          -5.120e+03  4.542e+04  -0.113  0.91024    
Jitter.PPQ5         -6.666e+01  1.825e+02  -0.365  0.71498    
Jitter.DDP           2.223e+03  1.514e+04   0.147  0.88328    
Shimmer             -1.225e+01  6.216e+01  -0.197  0.84372    
Shimmer.dB.          4.220e+00  4.747e+00   0.889  0.37404    
Shimmer.APQ3        -9.261e+04  4.574e+04  -2.025  0.04294 *  
Shimmer.APQ5         1.393e+02  5.494e+01   2.535  0.01127 *  
Shimmer.APQ11       -7.494e+00  2.381e+01  -0.315  0.75292    
Shimmer.DDA          3.078e+04  1.525e+04   2.019  0.04356 *  
NHR                 -3.345e+01  6.183e+00  -5.411 6.60e-08 ***
HNR                 -4.263e-01  6.737e-02  -6.327 2.73e-10 ***
RPDE                 2.688e-01  1.821e+00   0.148  0.88269    
DFA                 -2.259e+01  2.339e+00  -9.659  < 2e-16 ***
PPE                  6.750e+00  2.873e+00   2.350  0.01883 *  
I(age^2)             1.256e-01  6.676e-03  18.817  < 2e-16 ***
I(subject.^2)        1.924e-02  1.007e-03  19.103  < 2e-16 ***
I(subject. * age^2) -4.070e-03  2.032e-04 -20.029  < 2e-16 ***
subject.:age         5.241e-01  2.699e-02  19.418  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.452 on 4675 degrees of freedom
Multiple R-squared:  0.379, Adjusted R-squared:  0.3758 
F-statistic: 118.9 on 24 and 4675 DF,  p-value: < 2.2e-16
summary(transformation.model)$r.squared
[1] 0.3789984
summary(transformation.model)$sigma
[1] 8.451706
summary(transformation.model)$sigma/mean(train$total_UPDRS)
[1] 0.2921046
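
All of the R^2 and RSE values above are computed on the training data, so they do not show how the models behave on the held-out 20%. One way to check is to predict on the test rows and compare the root mean squared error. This is a sketch on synthetic data under assumed names (`demo`, `rmse`, `linear_fit`, `quad_fit` are hypothetical), not a result for the Parkinsons models.

```r
set.seed(4)
n <- 500
demo <- data.frame(x = runif(n, 0, 10))
demo$y <- 1 + 0.5 * demo$x^2 + rnorm(n)   # truly quadratic relationship

# 80/20 split, mirroring the split used above
idx <- sample(n, size = round(0.8 * n))
demo_train <- demo[idx, ]
demo_test  <- demo[-idx, ]

linear_fit <- lm(y ~ x, data = demo_train)
quad_fit   <- lm(y ~ x + I(x^2), data = demo_train)

# Root mean squared error of a fitted model on new data
rmse <- function(fit, newdata) {
  sqrt(mean((newdata$y - predict(fit, newdata = newdata))^2))
}

rmse_linear <- rmse(linear_fit, demo_test)
rmse_quad   <- rmse(quad_fit, demo_test)
c(linear = rmse_linear, quadratic = rmse_quad)
```

For the models above, the equivalent call would compare test$total_UPDRS with predict(model, newdata = test), and likewise for the other two models.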

7. Provide diagnostic plots for all the models built and comment on whether the models are appropriate based on what these plots show

# Compare the two nested models with an F-test
# H0: the base model fits the data as well as the larger model
# H1: the larger model (interaction and polynomial terms) fits significantly better
anova(model,transformation.model)
Analysis of Variance Table

Model 1: total_UPDRS ~ (subject. + age + sex + test_time + motor_UPDRS + 
    Jitter... + Jitter.Abs. + Jitter.RAP + Jitter.PPQ5 + Jitter.DDP + 
    Shimmer + Shimmer.dB. + Shimmer.APQ3 + Shimmer.APQ5 + Shimmer.APQ11 + 
    Shimmer.DDA + NHR + HNR + RPDE + DFA + PPE) - motor_UPDRS - 
    total_UPDRS
Model 2: total_UPDRS ~ (subject. + age + sex + test_time + motor_UPDRS + 
    Jitter... + Jitter.Abs. + Jitter.RAP + Jitter.PPQ5 + Jitter.DDP + 
    Shimmer + Shimmer.dB. + Shimmer.APQ3 + Shimmer.APQ5 + Shimmer.APQ11 + 
    Shimmer.DDA + NHR + HNR + RPDE + DFA + PPE) - motor_UPDRS - 
    total_UPDRS + subject. * age + I(age^2) + I(subject.^2) + 
    I(subject. * age^2)
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1   4679 402008                                  
2   4675 333941  4     68066 238.22 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
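
The F statistic in the table can be reproduced from the two RSS values and their degrees of freedom: F = ((RSS1 - RSS2) / (df1 - df2)) / (RSS2 / df2). The sketch below copies the rounded numbers from the table, so the result matches only approximately; the variable names are illustrative.

```r
# Values copied from the anova() table above (rounded as printed)
rss_small <- 402008; df_small <- 4679   # Model 1 (base model)
rss_big   <- 333941; df_big   <- 4675   # Model 2 (with extra terms)

# Drop in RSS per added parameter, relative to the larger model's residual variance
f_stat <- ((rss_small - rss_big) / (df_small - df_big)) / (rss_big / df_big)
f_stat   # about 238.2

# Upper-tail probability of the F distribution with (4, 4675) degrees of freedom
p_value <- pf(f_stat, df_small - df_big, df_big, lower.tail = FALSE)
p_value
```

A p-value this small rejects H0, so the extra terms improve the fit on the training data.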
# Diagnostic plots
plot(model)

# Using interaction term
plot(interaction.model)

# Using non-linear transformations
plot(transformation.model)
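
Calling plot() on an lm object produces four diagnostic plots in sequence. A possible way to place them on one page and save them is sketched below on a synthetic model; the file name and the `demo` data are assumptions for illustration.

```r
set.seed(5)
demo <- data.frame(x = rnorm(100))
demo$y <- 2 * demo$x + rnorm(100)
fit <- lm(y ~ x, data = demo)

out_file <- file.path(tempdir(), "diagnostics.pdf")
pdf(out_file)
par(mfrow = c(2, 2))  # 2 x 2 grid: one panel per diagnostic plot
plot(fit)             # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
dev.off()
```

The same three lines between pdf() and dev.off() could be repeated for model, interaction.model and transformation.model.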
